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Abstract 

We consider the problem of large-scale inference on the row or column variables of 
data in the form of a matrix. Often this data is transposable, meaning that both the 
row variables and column variables are of potential interest. An example of this sce- 
nario is detecting significant genes in microarrays when the samples or arrays may be 
dependent due to underlying relationships. We study the effect of both row and column 
correlations on commonly used test-statistics, null distributions, and multiple testing 
procedures, by explicitly modeling the covariances with the matrix- variate normal dis- 
tribution. Using this model, we give both theoretical and simulation results revealing 
the problems associated with using standard statistical methodology on transposable 
data. We solve these problems by estimating the row and column covariances simultane- 
ously, with transposable regularized covariance models, and de-correlating or sphering 
the data as a pre-processing step. Under reasonable assumptions, our method gives 
test statistics that follow the scaled theoretical null distribution and are approximately 
independent. Simulations based on various models with structured and observed covari- 
ances from real microarray data reveal that our method offers substantial improvements 
in two areas: 1) increased statistical power and 2) correct estimation of false discovery 
rates. 

Keywords: multiple testing, false discovery rate, transposable regularized covariance 
models, large-scale inference, covariance estimation, matrix-variate normal, empirical 
null 

1 Introduction 

As statisticians, we often make assumptions when constructing a model to ease computa- 
tions or employ existing methodologies. When analyzing matrix data, we often assume that 
the variables along one dimension (say the columns) are independent, allowing us to pool 
these observations to make inferences on the variables along the other dimension (rows). In 
microarrays, for example, it is common to assume that the arrays are independent observa- 
tions when computing test statistics, allowing us to assess differential expression in genes. 
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Since we are testing many row variables (for example, genes) simultaneously, we commonly 
correct for multiple testing using procedures that theoretically are known only to control 
error measures when the row variables are independent or follow limited dependence struc- 
tures. Thus, for inference with matrix data, we often make assumptions of independence 
or limited dependencies among the row variables and among the column variables to be 
able to employ existing statistical methodologies. What if these assumptions are incorrect? 
What if this matrix data is in fact transposable, meaning that potentially both the rows 
and/or columns are correlated? 

In this paper, we consider the problem of testing the significance of row variables in a 
data matrix where there are correlations among the rows, or among the columns, or among 
both. We study the behavior of standard statistical methodology on transposable data and 
then propose a method to directly account for the dependencies when conducting inference. 

Throughout this paper, we often refer to the example of detecting genes that are differ- 
entially expressed between two classes in microarray data. These genomic datasets contain 
complicated correlation structures. Genes in similar pathways, for example, are usually 
highly positively correlated. Other genes may encode proteins that act as inhibitors lead- 
ing to negative correlations. In the analysis of microarrays, it is common to assume that 
the arrays are independent. Many have suggested, however, that this may not be correct 



( Efron 


2009 


Leek and Storey 


2008; 


Owen] [2005, 


Qiu et al. 


2005) 



process or latent variables. Arranged in the form of a matrix, this means that both the row 
(gene) and column (array) variables could be dependent, indicating that the data could be 
transposable. 

While we focus on the example of detecting significant genes in the two-class microarray, 
our methods can be applied to many examples of large-scale inference with transposable 
data. These include: testing the significance of proteins, genes, or isoforms in data such 
as protein arrays and next-generation sequencing data, testing the significance of voxels in 
functional magnetic resonance imaging data, and testing the significance of biomarkers in 
three-way data where measurements are taken on multiple subjects at several time points 
or in many different laboratories. In all of these examples, the assumptions of independence 
along one dimension of the data is questionable. 

We begin by introducing two examples that we will refer to throughout this paper. 
The first is a two-class microarray study of cardiovascular disease (Efron, 2009). We will 
refer to this as the "Cardio" data. This data has m = 20, 426 genes and n = 63 arrays 
consisting of 44 controls and 19 diseased patients. The second is a two-class microarray 
study of two types of Leukemia cancer (Golub et al. 1999), which we will refer to as the 
"Leukemia" data. This data has m = 3, 701 filtered genes and n = 72 arrays with 25 and 47 
samples in each subtype. For each of these datasets, we calculate the two-sample i-statistic 
for each gene and compare their distribution to that of the theoretical null distribution in 
Figure [T] We see that the i-statistics are over-dispersed compared to their theoretical null 
distributions. This could be due to the highly correlated nature of the thousands of genes, 
or another cause could be correlations among the arrays. In fact, the permutation tests of 



Efron (2009) reject the null hypothesis of independent arrays for both of these microarrays. 
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Figure 1: Histograms of two-sample t-statistics for the "Cardio" data (left) and the "Leukemia" data 
(right). Log intensity values were used with the genes and arrays centered. The theoretical null distribution, 
the t- distribution with 70 degrees of freedom (left) and 61 degrees of freedom (right), is drawn in red. 



When studying inference with transposable data, the effects of row and column corre- 
lations must be considered separately. Since the columns are generally considered to be 
independent, population column correlations lead to the used of incorrect test statistics 
and null distributions which in turn result in problems when correcting for multiple test- 
ing. Row correlations lead to the much discussed problem of multiple testing dependence 



et al. 2004). 



(Benjamini and Yekutieli 2001 Hommel, 1986 Leek and Storey, 2008 Sarkar, 2008; Storey 



We propose to study and solve these problems by modeling row and column correlations 



using the mean-restricted matrix- variate normal distribution (Allen and Tibshirani 2010) 



described in Section [2j The first half of our paper is devoted to studying the effects of 



these correlations on test statistics and their theoretical null distributions, Section 2.2 



and on power and multiple testing procedures through a simulation study in Section |3.2 
Interestingly, this study finds the following results. 

1. Unanticipated column correlations dramatically alter the null distributions of test 
statistics leading to the use of incorrect test statistics, null distributions and estimates 
of the FDR. 

2. Row correlations do not seem to affect the estimates of the FDR. 

The later half of our paper is focused on solving the problems associated with row 
and column correlations by directly making use of the correlation structure. In Section 
[4j we simultaneously estimate row and column covariances using transposable regularized 
covariance models (Allen and Tibshirani 2010). We then present an algorithm to sphere 



or de-correlate the rows and columns so that they are approximately independent. This 
algorithm is to be used as a pre-processing step and in conjunction with standard multiple 
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testing procedures. Simulation results using our sphering algorithm are presented in Sec- 
tion [5] under various models on both structured covariance and real microarray covariance 
examples. These reveal two important results: 

(c) Sphering can alter the rank of the test statistics leading to an ordering with higher 
statistical power. 

(d) Sphering often leads to substantial improvements in the estimation of the FDR. 
We conclude with a discussion of our study and methods in Section [6j 



2 Theoretical Framework 

In this section, we first present a matrix decomposition model based on the mean-restricted 



matrix- variate normal in Section 2.1 Then, going back to the two-class microarray example, 
we consider the test statistic for a single gene. Since the arrays are usually assumed to 
be independent, the two-sample z and t-tests are used commonly to assess differential 
expression. We give the theoretical null distributions for these test statistics under our 
model with column correlations in Section [2.21 



2.1 Model 



We propose to study row and column correlations through a simple matrix decomposition 
model based on the matrix- variate normal. We motivate the use of this distribution through 
the example of microarrays. 

In microarray data, the genes are often assumed to follow a multivariate normal distri- 
bution with the arrays independent and identically distributed. Since we aim to study the 
effects of array correlations, we need a parametric model that has the flexibility to model 
either array independence or various array correlation structures. To this end, we turn to 



the mean-restricted matrix- variate normal introduced in Allen and Tibshirani| (2010). (We 



also note that Efron (2009) proposes the matrix- variate normal as a model for microarrays). 
This distribution, denoted as X ~ N m ^ n {v, fj,, S, A), has separate mean and covariance pa- 
rameters for the rows, v G 3f? m and T, G 3f? mxm , and columns, /i£ S" and A G $t nxn . Thus, 
we can model array correlations directly though the covariance matrix A. If the matrix 
is transformed into a vector of length np, we have that vec(X) ~ iV(vec(M), fi), where 
M = + l( m ) an d f2 = A (g> £. Also, the commonly used multivariate normal is a spe- 
cial case of the distribution. If A = I and fj, = 0, then X ~ N{v, £). In fact, all marginal 
models of the matrix-variate normal are multivariate normal, meaning that both the genes 
and arrays separately are multivariate normal. Further properties of this distribution are 
given in Allen and Tibshirani (2010). 

In our matrix decomposition model, we will assume that the data, X, has m rows and 
n columns. We define the overall row means as v G 3ft m and column means as fi G 3ft" - . The 
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covariance of the rows is S S gfj mxm anc j ^he covariance of the columns is A £ 5R raxn . Then, 
we decompose the data into a mean, signal, and correlated noise matrix as follows. 



~i~ S m xrt + Nr, 



(1) 



where M = v\^ n ^ + (mean matrix), 

S is problem specific (signal matrix), 
N ~ N m>n (0, 0, S, A) (noise matrix). 



N m ^ n {v, /x, S, A), meaning that after removing the signal, the data follows 



Thus, X - S 

a mean-restricted matrix- variate normal distribution. 

For the example of the two-class microarray, we let there be n\ arrays in class one, with 
indices denoted by C\, and n>2 in class two, C<i. (For simplicity of notation, we assume that 
the first ni arrays are in class one and the last ri2 arrays are in class two.) The class signals, 
or the gene means for each class are defined as 6 3ft m and ip2 £ Then, the signal 
matrix, S, can be written as follows. 

V>iif ni) ^ 2 if n2) 

There are several remarks to make regarding this model. First, prior to analyzing data, 
it is common to standardize the rows. Sometimes this two-way data is doubly- standardized, 



or both the rows and columns are iteratively scaled (Efron, 2009; Olshen and Rajaratnam 



2010). Here, we center both the rows and columns through the mean matrix M, but do 



not directly scale them. Instead, we allow the diagonals of the covariance matrices of the 
rows, SI, and columns A, to capture the differences in variablities. Thus, our model keeps 
the mean and variances separate in the estimation process. 



2.2 Null Distributions: The Two-Class Problem 

In this section, we study the effect of column correlations on the theoretical null distribution 
of two-sample test statistics computed for a single row of the data matrix. More specifi- 
cally, we calculate the distributions of test statistics under our matrix decomposition model 
instead of the typical two-sample framework where samples are drawn independently from 
two populations. This corresponds to considering a single test for differential expression of 
gene i between the two classes. 

In the familiar two-sample hypothesis testing problem, we have a vector x = [xi X2] 

with xi of length n\ and X2 of length ri2 where the elements of each vector are x%i l ~ N(ipi, a 2 ) 

and X2 : i *~ N{ip2-, ' 2 )- We wish to test whether there is a shift in means between the two 
classes, namely 

H : ipi = tp 2 vs. Hi : ipi ^ ip 2 . (2) 

Throughout this paper, we will assume that the variances a 2 are equal between the two 
classes, a common assumption in microarrays. 
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If the variance, o~ is known, we have the familiar two-sample Z-statistic, 
Z = ^^, withZ-jV ^ 1 -^ ,! 



where Xh = — y^,? k i Xj and c„ = — + — . 

Now, going back to our matrix decomposition model, we wish to know the distribution 
of the Z-statistic for each row when there are column correlations. 

Theorem 1 Let x = [xi x 2 ] ~ N 1>n (0, [^il(m) ^M(n 2 )] ; cr 2 , A) . T/ien, 

Z-Jvf ^-j 2 ,^ (3) 

A U ( 1 1 V 

where r\ = — Ljj Ljj J and L is i/ie matrix square root of A.. 

i=i \ ni ieCi 722 iec 2 / 

In terms of the decomposition ([!]) , the assumptions of Theorem [T] correspond to a row 
vector previously centered by v and fi, with signal [V>il( m ) V'2l( n2 )]) column covariance 
A, and row variance cr 2 , the diagonal element of S. For microarrays, the result states 
that when the columns (arrays) are correlated, the variance of the Z-statistic is inflated or 
deflated by ?y, a function of the column covariance. Notice that if A = I, 77 = c n and the 
variance of Z is one. If there is only column correlation within the two classes we have the 
following result. 

Corollary 1 Assumex = [xi x 2 ] with^i ~ -/Vi, ni (0, ^il( ni ), a 2 , Ai) andx 2 ~ iVi in2 (0, ^ 2 l(n 2 )> ^ ^2) 
smc/i i/iai Cov(x) = A = ( ^ 1 ^ | , i/ien 

V a 2 y 

Z^ N (^tl,Hl±Jl) (4) 

n k / n k \ 2 n fc n fe 

where r/k = — ^ L/^j J = — j ^^y^A^ /or fc = 1,2 and is f/ie matrix 

k i = \ \ j—\ I k i = \ j = i 

square root o/Afc. 

In both of the previous results, we assumed that the row variance, a 2 was known. 
However, in most microarray experiments this is not known and must be estimated. With 
a 2 unknown, for testing the hypothesis ([2]), the two-sample i-statistic is used. 

xi- x 2 n2 _ Eie^c 1 ( x i,i- s i) 2 + E l ec 2 ( x 2,i-^2) 2 



s xi,x 2 — Z 7~Z o \°> 



1,2 nx+n 2 -2 
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Figure 2: Comparison of theoretical null distributions for the two-sample Z -statistic (left) and T -statistic 



(right) under various column covariance scenarios given in Section 2.2 Variances of the Z -statistics were 



calculated by the result in Theorem^ while the densities of the T -statistics were estimated via a simulation 
with one million replicates. 



with c n and Xk as previously defined. Under the null hypothesis, T ~ tr n _ 2 )> while under 
the alternative, T ~ t{$)(n-2)i a non-central t distribution with non-centrality parameter 
5 = C01 - ^ 2 )/(o-- v /c^). 

When there are column correlations as in the assumptions of Proposition [TJ however, the 
distribution of T does not have a closed form. (The square of the pooled sample standard 
deviation is no longer distributed as a Chi-squared random variable and the numerator and 
denominator of T are not independent.) Hence, we explore the effects of column correlations 
on the T-statistic through a small simulation study. Data is simulated according to the 
assumptions of Theorem [TJ with n = 50 columns with n\ = n 2 = 25 in each class. Four 
structured covariance matrices were used to assess the Z and T-statistics under the column 
correlations scenarios, as given below. 

• Ai : Ai,y = 0.9 |l " i! . 

• A2: Blocked diagonal with blocks of size 10. Within each block, A2,<j = 0.9' 1 "-''. 

• A 3 : A 3 ,y =0.5 I< - J ' 1 . 

• A4: Blocked diagonal with blocks of size 10. Within each block, A^jj = 0.5' 1- -''. 

Figure [2] reveals the effect of column correlations on the distributions of Z and T. We see 
that column correlations can cause dramatic over-dispersion of the test statistics compared 
to their theoretical null distribution. This is a possible explanation to the over-dispersion 
seen in the real microarray examples of Figure [TJ Compared to the variance of the Z- 
statistic, the T-statistic appears to be even more affected by column correlations. This is 
confirmed in Table[T]where we present the variances of the Z-statistic calculated by Theorem 
[TJ and the variances of the T-statistic estimated by Monte Carlo simulation. Indeed, small 
amounts of correlation in the columns can cause a dramatic increase in the variance of the 
T-statistic. 

In this section, we have shown how the distribution of T and Z-statistics behave when 
columns or arrays are correlated. When analyzing microarrays, however, many have advo- 



cated using a non-parametric method, estimating the null distribution by permutations Ge 



8 



G. I. Allen & R. T ibshirani 





Var(Z-statistic) 


Var(T-statistic) 


Ai 


9.215 


19.94 (0.029) 


A 2 


6.069 


9.492 (0.0144) 


A 3 


2.76 


3.197 (0.00472) 


A 4 


2.45 


2.79 (0.00411) 



Table 1: Variances of the two-sample Z and T -statistics under various column correlation scenarios as 
given in Section \2.2\ Variances of the T -statistics were estimated via simulation with one million replicates. 
The theoretical variance of the Z -statistic should be one, and 1.022 for the T -statistic. 



et al. (2003); Storey and Tibshirani (2003); Tusher et al. (2001). For the two-class microar- 



ray, one would permute the class labels and calculate the T-statistic for each permutation. 
These permutations form a null distribution, as under the null hypothesis Q, the class 
means are the same. Thus, each permutation of the labels is equally likely. When the ar- 
rays are correlated, however, this assumptions fails. Each permutation of the columns is not 
equally likely under the null due to the array covariance structure. While we do not explore 
the behavior of the permutation nulls further in this section, we include permutation-based 
methods from Storey and Tibshirani (2003) in our simulation study in the following section. 



3 Study: Dependence and Multiple Testing 

In the previous section, we presented the theoretical null distributions of commonly used 
test statistics for a single two-sample test statistic when the columns are correlated. With 
transposable data, however, one needs to test possibly tens of thousands of row variables, 
thus creating a problem of multiplicity. In this section, we first review some multiple testing 
procedures that are known to control errors under certain types of dependencies. We then 
present a series of simulations to study the behavior of commonly used multiple testing 
procedures when the rows and columns are correlated. 



3.1 Background 



A common error measure for controling the number of false positives in microarrays is the 
False Discovery Rate (FDR). This is the expectation of the False Discovery Proportion 
(FDP): let V be the number of false positives and R be the total number of rejections, then 
q = FDR = E(V/R\R > 0). Typically, investigators seek to control the FDR at q = 0.1, 
meaning that on average 10% of rejections are false. 



The step-up method of Benjamini and Hochberg (1995) is one of the most widely used 



methods for controling the FDR. Benjamini and Yekutieli (2001) have shown that this 



method controls the FDR under types of positive dependence, specifically positive regres- 



sion dependence, and Sarkar (2008) has relaxed this assumption to slightly broader forms 



of positive dependence. This may not be appropriate for all types of transposable data, 
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especially microarrays where we expect some negative correlations between genes. Alter- 
natively, Benjamini and Yekutieli (2001) have shown that dividing the thresholds in the 
step-up procedure by a constant controls the FDR under arbitrary dependencies. 

Another commonly used method to control the FDR is based on re-sampling or per- 



mutation distributions (Ge et al. 2003 Storey, 2002 Tusher et al. 2001; Yekutieli and 



Benjamini, 1999). Theoretically, these methods are only known to control the FDR asymp- 



totically under types of weak dependence, which encompasses forms of local dependence 
such as finite blocks (Storey et al. 2004). Thus, there could be many transposable data 
sets in which the row variables do not satisfy these dependence structures. (We also note 
that applying the step- up method to the permutation- adjusted p- values is equivalent to the 
direct FDR estimation via re-sampling (Storey et al. 2004| )). 



Also, to directly account for correlations, Efron (2004, 2007) proposed a method to fit 
an empirical null to the data. One can then estimate the local FDR and then the FDR by 
averaging the local FDR over the tail regions. 



3.2 Simulation Study 



We study the effects of both row and column correlations on standard statistical methodol- 
ogy used for large-scale inference through a simulation study based on our matrix decom- 
position model. We compare FDR estimates of four types of FDR-controlling procedures to 
the true false discovery proportion (FDP). The four methods we compare are the step-up 



dependence of 



and Tibshirani 



method of Benjamini and Hochberg (1995), the step- up method for control under arbitrary 



Benjamini and Yekutieli (2001), the permutation-based method of Storey 



(2003), and the method based on the empirical null and local FDR's of 



Efron (2007). The two-sample i-statistic was used for all methods with p- values computed 
by comparing it to the i( n _2) distribution for the step-up procedures. We used 1000 permu- 
tations for the permutation-based method. The defaults in the localfdr package available 
on CRAN, the R language repository. These defaults fit the null distribution as a natural 
spline with seven degrees of freedom, for the empirical null-based method. 

Our simulation study is structured as follows. The data is simulated under the matrix 
decomposition model ([TJ and is of size 250 by 50. The first 50 rows are non-null with a two- 
class signal matrix given by Vi,i:25 = 0.5, ^1,26:50 = -0.5, -02,1:25 = -0.5, -02,26:50 = 0.5 and 
the last 200 elements of tpi and ip2 equal to zero. We consider two types of row covariances, 
Si with all positive correlat i ons sa tisfying the positive regression dependence assumption 
of (Benjamini and Yekutieli 2001), and X2 with both positive and negative correlations. 
Both of these row covariances are block diagonal. We simulate data under three column 
covariances, with the first being the identity, or no column correlations. The others, Ai 
and A2 reflect a local and a class effect, respectively. These simulation covariances are 
summarized below. 

• Si: Blocked diagonal with blocks of size 10. Within each block, £1,^ = 0.9' l ~ J '. 

• £2: Blocked diagonal with blocks of size 10. Within each block, £2,ij = (— 0.9)' 1- -''. 

• Ai: Blocked diagonal with blocks of size 10. Within each block, Ai,ij = 0.5' 1- -''. 

• A2: Blocked diagonal with blocks of size 25. Within each block, A2,jj = 0.5' 1- -''. 
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(b) E 2 , A = I 





Number Rejected Tests 



Number Rejected Tests 
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(0 la, A 2 
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Number Rejected Tests 



Number Rejected Tests 



True False Discovery Proportion (FDP) 
Empirical null-based method (Etron. 2007) 
Permutation-based method (Storey and Tibshirani. 2003) 
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Figure 3: Simulation Study FDR Curves: The true and estimated false discovery proportions plotted 
against the number of tests rejected for each of the six simulations. All data was simulated under the matrix 
decomposition model, with parameters given in Section 3.2 



We present plots of the true FDP verses the number of hypotheses rejected for the four 
methods for one realization of each of the six simulation scenarios in Figure |3j We note 
that the lines above the true FDP curve denote conservative FDR estimates. In Table [2j 
we report the true and estimated false discovery proportions (FDP) when fixed numbers 
of hypotheses are rejected (40, 45, 50, 55 and 60 tests). Results are averaged over ten 
simulations with the standard error also reported. 

This simulation study reveals several interesting results. First, dependencies among the 
rows do not seem to effect FDR estimation with the four multiple testing procedures. When, 
A = I as in simulations (a) and (b), the methods generally conservatively estimate the true 
FDP. This is noteworthy since besides the method of Benjamini and Yekutieli (2001 ), there 
are limited theoretical results supporting FDR control under various dependencies. 

When there are even moderate correlations between columns, simulations (c) through 
(f), the four methods give poor estimates of the FDR. The step- up method and the 
permutation-based method perform similarly. They both give extremely conservative esti- 
mates of the FDP when there is either a local or a class effect among the columns. Thus, 
when using these methods for controlling the FDR at q = 0.1, for example, one would reject 
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True FDP 


(Benjammi & (Benjammi & (storey Hi . onn7"i 
Hochberg, 1995) Yekutieli, 2001) Tibshirani, 2003) ( T ° n ' ' 


(a) £i,A = I 

40 tests 
45 tests 
50 tests 
55 tests 
60 tests 


0.0725 (0.033) 
0.104 (0.034) 
0.124 (0.033) 
0.169 (0.025) 
0.222 (0.02) 


0.0723 (0.022) 0.387 (0.091) 0.0742 (0.022) 0.257 (0.068) 
0.103 (0.026) 0.545 (0.11) 0.105 (0.026) 0.286 (0.07) 
0.147 (0.028) 0.715 (0.1) 0.149 (0.029) 0.32 (0.069) 
0.19 (0.03) 0.823 (0.092) 0.193 (0.03) 0.344 (0.066) 
0.255 (0.036) 0.891 (0.069) 0.258 (0.036) 0.376 (0.063) 


(b) £ 2 ,A = I 

40 tests 
45 tests 
50 tests 
55 tests 
60 tests 


0.035 (0.017) 
0.0711 (0.019) 
0.12 (0.018) 
0.167 (0.016) 
0.217 (0.014) 


0.0498 (0.0081) 0.304 (0.049) 0.0513 (0.0082) 0.161 (0.034) 
0.0839 (0.012) 0.512 (0.076) 0.0856 (0.012) 0.209 (0.041) 
0.141 (0.021) 0.771 (0.099) 0.143 (0.021) 0.249 (0.045) 
0.191 (0.029) 0.822 (0.094) 0.192 (0.029) 0.278 (0.04) 
0.243 (0.035) 0.867 (0.074) 0.246 (0.035) 0.311 (0.041) 


(c) Si.Ai 

40 tests 
45 tests 
50 tests 
55 tests 
60 tests 


0.0075 (0.0053) 
0.0222 (0.0099) 
0.056 (0.016) 
0.111 (0.011) 
0.178 (0.0083) 


0.0588 (0.019) 0.329 (0.087) 0.0578 (0.019) 0.0152 (0.0053) 
0.134 (0.037) 0.599 (0.11) 0.133 (0.038) 0.0511 (0.025) 
0.245 (0.052) 0.847 (0.087) 0.247 (0.053) 0.0858 (0.032) 
0.426 (0.04) 1 (0) 0.43 (0.041) 0.139 (0.032) 
0.579 (0.052) 1 (0) 0.585 (0.053) 0.167 (0.028) 


(d) S 2 ,Ai 

40 tests 
45 tests 
50 tests 
55 tests 
60 tests 


0.005 (0.005) 
0.0111 (0.005) 
0.042 (0.0081) 
0.109 (0.0086) 
0.178 (0.0056) 


0.0455 (0.0065) 0.277 (0.04) 0.0439 (0.0065) 0.00846 (0.0041) 
0.111 (0.027) 0.58 (0.09) 0.11 (0.026) 0.0198 (0.0076) 
0.225 (0.046) 0.869 (0.082) 0.225 (0.047) 0.0493 (0.014) 
0.404 (0.034) 1 (0) 0.409 (0.034) 0.0923 (0.017) 
0.552 (0.048) 1 (0) 0.554 (0.048) 0.133 (0.018) 


(e) £i,A 2 

40 tests 
45 tests 
50 tests 
55 tests 
60 tests 


0.0125 (0.0077) 
0.0333 (0.015) 
0.078 (0.015) 
0.135 (0.012) 
0.198 (0.011) 


0.0831 (0.018) 0.476 (0.09) 0.0783 (0.019) 0.0749 (0.024) 
0.164 (0.031) 0.746 (0.097) 0.16 (0.032) 0.117 (0.032) 
0.281 (0.027) 0.969 (0.031) 0.276 (0.028) 0.165 (0.032) 
0.368 (0.028) 1 (0) 0.364 (0.028) 0.194 (0.028) 
0.461 (0.044) 1 (0) 0.458 (0.045) 0.234 (0.028) 


(f) £ 2 ,A 2 

40 tests 
45 tests 
50 tests 
55 tests 
60 tests 


0.0075 (0.0053) 
0.0311 (0.011) 
0.078 (0.012) 
0.144 (0.013) 
0.198 (0.013) 


0.0712 (0.016) 0.407 (0.073) 0.066 (0.015) 0.0444 (0.012) 
0.169 (0.022) 0.855 (0.072) 0.163 (0.021) 0.087 (0.019) 
0.277 (0.025) 0.99 (0.0095) 0.271 (0.025) 0.132 (0.021) 
0.388 (0.037) 1 (0) 0.383 (0.037) 0.167 (0.019) 
0.47 (0.044) 1 (0) 0.468 (0.043) 0.197 (0.02) 



Table 2: Simulation Study: The effect of row and column correlations on estimation of the false discovery 
rates. The true false discovery proportion (FDP) and estimates with standard errors using the step-up, step- 
up for dependence, permutation, and empirical null based methods, as described in Section \3. 2\ are given 
when a pre-specified number of tests are rejected. All simulations were done using the matrix decomposition 
model, Q, with parameters given in Section 3.2 and repeated ten times. 



less than 45 genes, when in reality one should be permitted to reject around 55 genes. We 
also see that while the method of Benjamini and Yekutieli (2001 ) controls the FDR are ar- 
bitrary dependencies, in practice this method is much too conservative for general use. On 
the other hand, the empirical null-based method of Efron (2007) performs inconsistently. 
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Overall, the results of this simulation study reveal that dependencies among rows do 
not seem to effect the performance of the multiple testing procedures. On the other hand, 
the theoretical results of Section 2.2 are confirmed: dependencies among the columns are 
extremely problematic when conducting large-scale inference. 



4 De-Correlating a Matrix 

In the previous sections, we have presented theoretical and simulation results demonstrating 
some of the problems with using standard statistical methodology for making inferences on 
transposable data. In the remainder of this paper, we present a solution to these problems 
by directly estimating the covariances and using these to sphere or de-correlate the data. 

The key to our method, based on the matrix decomposition model 0, is the simul- 
taneous estimation of the row and column covariances. This is important because of the 
close relationship between the observed row and column covariances. Take, for example, the 
empirical covariances of a centered data matrix X, 5] = XX T jm and A = X r X jn. If we 
take the singular value decomposition, X = U D V T , then S = UD U T and A = V D V T , 



i.e. the two covariances estimates share the same eigenvalues. In fact, Efron (2009) shows 
that the variance of the two correlation matrices is the same. Because of this, population 
correlations among the rows, for example, often make the columns seem correlated. Thus, 
estimating the column covariance without accounting for the covariances of the rows is 
problematic. With Transposable Regularized Covariance Models, we can estimate both 5] 
and A simultaneously according the the matrix- variate normal framework. We review these 
models and discuss their relevance for the example of microarrays in the next section. 

4.1 Review: Transposable Regularized Covariance Models 

The Transposable Regularized Covariance Model (TRCM) allows us to estimate a non- 
singular row and column covariance matrix by maximizing a penalized log-likelihood of 



the matrix- variate normal distribution (Allen and Tibshirani, 2010). The model places a 



strictly convex penalty on the inverse covariances, or concentration matrices, of the rows 
and columns. For estimating the covariances in this context, we propose to use a sparsity- 
inducing penalty, an L\ penalty, on the concentration matrices. Following from the matrix 
decomposition model, 0, if we let N be the noise matrix remaining after removing the 
means and the signal in the data, then the penalized log-likelihood is as follows. 

£(£, A) = -log| S" 1 | + — log| A" 1 | - -tr (XT 1 N A" 1 N T ) - Am|| E" 1 ||i - An|| A" 1 ||i 

(6) 

where || A" 1 ||i = Ya=i Sj=i I I an d A is a penalty parameter that must be estimated. 

We motivate the use of Q first by discussing practical considerations. As the columns 
of the data matrix are usually There are several advantages assumed to be independent, 
A = I, this should be our default position. By placing an L\ penalty on A" 1 , our model 
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encourages sparsity in the off-diagonal elements of A. Also, notice that we have one penalty 
parameter, A, that is modulated by the dimension of the rows and columns. (We note that 
the penalty parameter, A, can be selected by cross-validation). Thus, the evidence of a 
partial correlation among columns must be strong relative to the correlations among the 
rows for an off-diagonal element of A" 1 to be estimated as non-zero. Secondly,specifically 
for microarrays, it seems reasonable to assume that the covariances among the genes is 
sparse, since biologically genes are likely only to be correlated with genes in the same or 
related pathways. 

We also pause briefly to discuss the theoretical rationale for using L\ penalties, instead 
of, for example, L2 penalties. Recall that covariance solutions to the TRCM model with 
Li penalties have eigenvectors that are equal to the left and right singular vectors of the 
data (Allen and Tibshirani 2010). Thus, the singular vectors of the data would remain 
the same when sphering with these estimates. In high- dimensional settings, however, it 



is well established that eigenvectors of empirical covariances are inconsistent (Johnstone 



and Lu, 2004), and thus, sphering by the L2 covariance estimates seems ill-advised. While 



the consistency of L\ TRCM estimates has not been established, there are consistency 
results for multivariate covariance estimation with an L\ penalty. Rothman et al. (2008) 
show convergence of the multivariate covariance estimate in the Frobenius norm and more 
importantly for the correlation estimate in the operator norm which implies convergence 
of the eigenvectors ( El Karoui , 2008 ) . These results reveal some of the possible theoretical 
advantages of using L\ penalties to estimate the covariances. 



4.2 Sphering Algorithm 

Based on the matrix decomposition model, Q, we present a method of de-correlating or 
sphering the data so that the rows and columns are approximately independent. This 
sphered data can then be used with standard multiple testing procedures to identify signif- 
icant row variables. Given a data matrix X with m rows and n columns, we present our 
sphering algorithm in Algorithm [TJ 



Algorithm 1 Sphering Algorithm 



1. 


Estimate row and column 


means, 


v and fi forming M, and the signal matrix, S. 


2. 


Define the noise, N = X - 


-M-S 


Estimate row and column covariances of noise, XI 




and A via TRCM. 






3. 


Sphere the noise: N = X! 


^NA~ 


1 

2 . Form the sphered data matrix: X = S + N. 



The Sphering Algorithm simply estimates the means and signal according to the matrix 
decomposition model ((!]) and then estimates the correlation structure among the rows and 
columns in the remaining noise. The TRCM covariance estimates, 5] and A are used 

- -1/2 - -1 - —1/2 

to de-correlated the noise. Here, 5] is the matrix square root of S and A of 
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A. (We use the symmetric square root defined by the following. Let 5] = PAP T be 
the eigenvalue decomposition of X , then the symmetric matrix square root is given by 

XI = PA 1 / 2 ? 7 .) Adding the signal back into this sphered noise, we obtain X which we 
call the sphered data. One can use this de-correlated data to find significant row variables. 
Now, we investigate some of the theoretical properties of the sphered data for the two- 



class problem introduced in Section 2.2 The sphered data, X has the following properties. 



Proposition 1 LetX ~ N m , n (M + S, S, A) where M = ul T {n) +l [m) ^ T andS = [V>ilf ni) tM^,)] 
and let X be the sphered data given by Algorithm^ Then, 

(i) E(X) = S = [V'llfm) ^2l(n 2 )L 

(ii) X-S~ AT m>n fo,0,S,A N 



w/iere £ = £ 5 SS 5 ami A = A ^ A A 5 . 

Thus, the class signal remains the same between X and X, and the covariance structure 
is all that changes. By sphering the noise, the noise of each row in X becomes a linear 
combination of the noise in the other rows. 



We now study how sphering the data affects the Z and T statistics from Section 2.2 
First, the Z-statistic does not change with sphering. The numerator of both the Z and T 
statistic, x\ — x<i is given by ?/>i — ip2, the components of the estimated signal matrix S. The 
denominator of the T-statistic, namely s Xl)X2 , the estimate of the noise, however, changes 



with sphering. Recall that in Section 2.2, we discussed how the T-statistic does not have 
a closed form distribution when there are column correlations. After sphering the data, 
however, the T-statistic on the sphered data follows a scaled t distribution under certain 
conditions. This is given by the following result. 

Proposition 2 Let X ~ AT m>n (M + S, S, A) where M = vlT ) +l (m)J u T and S = hM^) ^MLj 

Let X be the sphered data given by Algorithm [7J and let the statistic T be the statistic for 
the i th row defined by ^ for the data X. Then under the null hypothesis Hq : ipx = ip2, 



ifA = I, 

oi v c n 

where c n = i + ~, rj = YJj=i Ei 6 Ci Li i ~ h SieC 2 Li j) and L is the matrix square 
root of A., o~i = Sjj and d{ = S«. 

Using our sphering algorithm, we obtain test statistics that follow known distributions 
when the sphered column covariance, A is the identity. If A is instead a diagonal matrix, 
then a simple scaling of the columns will give the above result. Notice that if the original 
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data, X, has no column correlations, A = I, and Ui = <7j, then T and T both follow a t 
distribution with n — 2 degrees of freedom. Thus, if the data originally follows the correct 
theoretical null distribution, then sphering the data does not change its null distribution. 
Also, if the sphered rows are independent, 5] = I, or approximately independent, then the 
statistics, T are independent or approximately independent. We also note that we can often 
assume that &i = <7j, thus eliminating that coefficient ratio from the distribution. This is 
especially a reasonable assumption if the rows are scaled prior to applying the sphering 
algorithm. 

Our results in Proposition [i] hold if the sphered column covariance A is the identity 
or diagonal. The TRCM model, however, estimates sparse penalized row and column co- 
variances. These penalized estimates will not capture the full covariances, but will instead 
estimate the major correlations. Thus, in practice, A and S are not likely to be exactly 
the identity. We have observed in simulations, however, that A is often diagonal or nearly 
diagonal, and thus the theoretical results are appropriate. 

When calculating p-values for f based on the distribution given in Proposition [2J we 
must know the value of n which depends on the original column covariance A. One could 
estimate this from A, but since the TRCM framework estimates penalized covariances, an 
estimate, f/, based on A will underestimate the population n. Hence to obtain the null 
distribution of the T-statistics, we have opted to scale T by the variance of the central 
portion of the observed distribution of the test statistics. This procedure is outlined in 



Algorithm 4.2 where p a (x) denotes the a quantile of x. and JQ is the indicator function. 



Algorithm 2 Scaling by the central portion of T. 



1. Let the expected proportion of null test statistics be ttq = rho/m. 

2. Estimate the variance of the central portion of sphered test statistics: 

o-J(tto) = Var % I (j- > p ((1 _^ ))/2 )(T), T < /9(i-* / 2 ) (T) 



3. Define the central-matched T-statistics: T* = T<7t (n _ 2) (^o)/<Xp(7To), where °f (n _ 2) (fl"o) 
is the variance of the central portion of the tr n _ 2 \ distribution. 



We scale by the central portion of the T-statistics so that the statistics can be tested 
against the t( n _2) distribution. Notice that if all of the test statistics are null and ttq = 1 
then, central-matching the variances reduces to scaling the T-statistics. Since under the 
assumptions of Proposition^ only the null T follow a scaled ^distribution, we do not want 
statistics corresponding to non-null tests to contaminate the variance estimates. Thus, we 
recommend using a conservative estimate of ttq, such as 0.8 or 0.9 for microarrays. 

By applying our sphering algorithm, we directly account for correlations among the rows 
and columns. This results in test statistics that more closely follow both their theoretical 
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nulls and the theoretical assumptions under which common multiple testing procedures are 
known to control the false discovery rate. 



5 Results 

We now evaluate the performance of our sphering algorithm through many simulated ex- 
amples. First, we compare data pre-processed by sphering to the standard row and column 
centering method on simulations based on the matrix decomposition model, ([!]). We use 
simulations from the matrix- variate normal with the structured covariances from the simu- 
lation study in Section |3.2| and also with covariances based on the observed dependencies in 
real microarray data. Finally, we test the robustness of our method and compare it to other 



methods for modeling dependencies in Section 5.2 For all simulations, the sphering algo- 
rithm was applied with the TRCM penalty parameter A selected by five-fold cross-validation 
and with statistics scaled by the central portion using ttq = 0.8. (Note that the "standard" 
pre-processing method refers to row and column centering throughout this section.) 

5.1 Simulations: Matrix- variate Model 

In all of these simulations, the data, X is simulated from the matrix decomposition model 
with m = 250 rows and n = 50 columns. The first 50 rows are non-null given by 
•01,1:25 = 0.5, ^1,26:50 = —0.5, 02,1:25 = —0.5, 02,26:50 = 0.5 and the last 200 elements of ipi 
and tp2 equal to zero. 

In Table [3j we present results on a subset of the simulations from our simulation study in 
Section [3. 2[ The remaining simulation study results are given in Appendix [A") The results in 
Table[3]show that de-correlating the data matrix yields improvements in 1) statistical power 
and 2) estimation of the FDR. We briefly illustrate this by examining a specific example 



from Table 3.2 Take the simulation with parameters Si, Ai and look at the results with 
55 tests rejected. We notice that the true FDP for the data pre-processed by the standard 
method is 0.111 whereas it is lower, 0.105, on the data that was sphered. This results from 
a favorable re-ordering of the test statistics that gives a higher statistical power, one minus 
the true FDP. Next, notice that the FDR estimates for the step- up and permutation-based 
methods are 0.426 and 0.43 respectively for the un-sphered data. These estimates are overly 
conservative, as the true FDP is 0.111. After sphering, however, the FDR estimates are 
0.124 and 0.125 which are much closer to the true FDP of 0.105. Hence, sphering also 
improves FDR estimation. 

Sphering the data as a pre-processing step to multiple testing procedures has many ad- 
vantages. First in microarrays, the higher statistical power that results from a re-ordering 
of test statistics is important to scientists who desire the top ranked genes from one mi- 
croarray study to translate to the top genes in another study. Also, while sphering leads to 
improvements in FDR estimation, it is still a slightly conservative estimate, as desired, for 
the true FDP. As with the simulation study, we find that the empirical null based method 



of Efron (2007) gives an inconsistent estimate of the FDR as it is both a conservative and 
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True FDP 


(Benjamini &i 
Hocnberg, 1995) 


r JJK. rjStimates 
(Storey & 

TVL.— l,j . ■ onflow 

libsmram, 2003) 


(Efron, 2007) 


Si,A = I 

40 tests 


0.0/25 (0.033) 
0.0333 (0.017) 


0.0(23 (0.022) 
0.0458 (0.019) 


0.0/42 (0.022) 
0.0452 (0.019) 


0.25/ (0.068) 
0.153 (0.075) 


45 tests 


0.104 (0.034) 
0.0469 (0.02) 


0.103 (0.026) 
0.0703 (0.025) 


0.105 (0.026) 
0.0705 (0.025) 


0.286 (0.07) 
0.173 (0.076) 


50 tests 


0.124 (0.033) 


0.147 (0.028) 
U.1U4 (^u.u^yj 


0.149 (0.029) 

U.1U5 (U.U29J 


0.32 (0.069) 

U.-iUY ^U.UYoJ 


55 tests 


0.169 (0.025) 
U.141 (U.UloJ 


0.19 (0.03) 
U.loo ^U.UooJ 


0.193 (0.03) 

U.loo (U.UooJ 


0.344 (0.066) 

U.^ol ^U.UbYj 


60 tests 


0.222 (0.02) 
U.194 ^U.U12j 


0.255 (0.036) 

U.Zoo (U.UooJ 


0.258 (0.036) 

U.Jo4 (U.UooJ 


0.376 (0.063) 

U.^o4 ^U.Uoy J 


40 tests 


U.UU/o (U.UUooJ 
0.00278 (0.0028) 


n nroo ni o\ 

u.uooo (u.uiyj 

0.00493 (0.0029) 


n ncTo / n nin\ 

u.uo/o (u.uiyj 

0.00469 (0.0029) 


U.Uloz (U.UUoo) 
0.0071 (0.0049) 


45 tests 


0.0222 (0.0099) 
0.00988 (0.0099) 


n lo-l (n no r 7\ 

0.134 (0.03/) 
0.0157 (0.0071) 


0.133 (0.038) 
0.0152 (0.007) 


0.0511 (0.025) 
0.0171 (0.0086) 


50 tests 


0.056 (0.016) 
0.0222 (0.011) 


0.245 (0.052) 
0.0438 (0.013) 


0.247 (0.053) 
0.0434 (0.013) 


0.0858 (0.032) 
0.0487 (0.018) 


55 tests 


0.111 (0.011) 
0.105 (0.0085) 


0.426 (0.04) 
0.124 (0.02) 


0.43 (0.041) 
0.125 (0.02) 


0.139 (0.032) 
0.118 (0.018) 


60 tests 


0.178 (0.0083) 
0.172 (0.0039) 


0.579 (0.052) 
0.199 (0.028) 


0.585 (0.053) 
0.201 (0.029) 


0.167 (0.028) 
0.146 (0.016) 



Table 3: A subset of the simulation study results: True false discovery proportions (FDP) and FDR 
estimates with standard errors are given when a pre-specified number of tests are rejected. Results using 
the sphering algorithm (in bold) are compared to data that has been row and column centered. All data was 
simulated under the matrix decomposition model, Q, with parameters given in Section 3.2 and repeated ten 
times. Two sets of values should be compared: the true FDP with sheering to without sphering, and the FDR 
estimates compared to the true FDP for both with and without sphering. 



liberal estimate for differing numbers of rejected tests. 

We also wish, however, to test the performance of our sphering algorithm on data with 
dependencies more similar to real microarray data. Thus, we build a second simulation study 
based upon the empirical covariances of the "Cardio" and the "Leukemia" microarrays. For 
each of the ten repetitions, we sample 250 genes and 50 arrays at random from each microar- 
ray. Let i G I and j £ J, be the sampled sets of the genes and arrays respectively, and as- 
sume X is the centered data matrix. We then calculate the empirical covariances, A.mle = 

YliexY,i'eT X T x i'/ m > ancl s mlb = Yjjej^j'ej X i X J' l n - The ^ ata is simulated from 
the mean-restricted matrix-variate normal with X ~ N([ipi ife] , 0, Smle, Amle)- Hence, 
the simulated data follows the observed covariance of the "Cardio" and "Leukemia" studies. 
Example images and FDR curves from this simulation are given in Figure [4] as well as the 
simulation results in Table |U 

The results of the structured covariance study, namely improvements in statistical power 
and FDR estimation, are confirmed on these microarray-based simulations. There are also 
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i 1 1 — i r 

20 40 60 80 
Number Rejected Tests 



1 — i — i — i — i — r 

20 40 60 80 100 
Number Rejected Tests 



Leukemia Sphered 

arrays 




Number Rejected Tests 



Number Rejected Tests 



True False Discovery Proportion (FDP) 
Empirical null-based method (Elron, 2007) 



Permutation-based method (Storey and Tibshirani, 2003) 
Step-up method (Benjamini and Hochberg, 1995) 



Figure 4: Example data images (top panel) and FDR curves (bottom panel) for the simulations based 
on dependencies within the "Cardio" and "Leukemia" microarrays. Data is either gene and array centered 
or sphered. In the FDR curves, the true and estimated false discovery proportions are plotted against the 
number of genes rejected. All data was simulated under the matrix decomposition model, |T]), with parameters 
given in Section\5. 1\ 



some specific notes to make regarding these simulations. First in Table |4j notice that using 
un-sphered data the FDR is overestimated on the "Cardio" simulations and underestimated 
on the "Leukemia" simulations. This is confirmed by the example giving the full FDR curves 
in Figure [4} After sphering, however, we see that the FDR estimates for both simulations 
are still conservative, but much closer to the true FDP. We note that all ten repetitions of 
the "Cardio" simulation estimated both 5] and A to be non-diagonal. This means that even 
after accounting for the correlations among the genes, there still appear to be significant 
correlations among the arrays. In the "Leukemia" simulation, however, A was estimated to 
be diagonal in all ten simulations. Thus, the correlations among the genes may be driving 
the over-dispersion seen in the i-statistic distributions of Figure [TJ 

Thus, from these simulations based on our matrix decomposition model, we see that 
sphering the data as a pre-processing step greatly improves statistical power and false 
discovery rate estimation. 
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True FDP 


r UK Estimates 
(Benjamini k, (Storey & on^7^ 
rlocnberg, lyyoj libsmram, zUUoJ 


"Cardio" 

40 tests 

45 tests 

50 tests 

55 tests 

60 tests 


0.U15 (0.01) 
0.00833 (0.0083) 


r\ n 7n /i /n m o\ n nono /n ai r\ n n o /in f'nm^X 
U.UY94 (U.Uloj U.UoUo (U.UloJ U.Uo4y (U.UloJ 

0.0311 (0.011) 0.0301 (0.011) 0.0469 (0.021) 


n noon /n r»i A \ 

0.0333 (0.014) 
0.0247 (0.013) 


0.144 (0.019) 0.146 (0.021) 0.0545 (0.021) 
0.0615 (0.02) 0.0597 (0.019) 0.0757 (0.031) 


0.068 (0.017) 
u.Uoyo (U.ulbJ 


0.294 (0.036) 0.295 (0.037) 0.0945 (0.024) 
U.lub yU.vjZo) U.1U4 {u.UZo) u.l y\j.\JZ\)) 


0.131 (0.013) 

n 1 1 c (n ni c) 
(U.U15J 


0.452 (0.068) 0.453 (0.068) 0.131 (0.025) 

nine / r\ r\o a\ nine (n hqq) aikq /n noo) 
U.iyb ^U.Uo4J U.iyo (U.Uooj U.loo (U.UoZj 


0.19 (0.011) 
u.iyi (u.uioj 


0.555 (0.072) 0.555 (0.072) 0.162 (0.022) 
U.^b ^U.U24J U.zoy (U.U^oj U.17o (U.U.J7J 


"Leukemia" 

40 tests 

45 tests 

50 tests 

55 tests 

60 tests 


n nt)7c: I r\ m c) 

U.Uofo (U.UloJ 
0.0375 (0.012) 


U.U777 (U.UUoo) U.U/o (U.<JUo4J U.zso (U.UooJ 
0.0477 (0.013) 0.0489 (0.013) 0.0928 (0.026) 


0.133 (0.023) 
0.0711 (0.016) 


niin/nni"^ niio/nr\in\ noni/nnc/iX 
0.119 (0.011) 0.112 (0.012) 0.324 (0.054] 

0.0771 (0.017) 0.08 (0.018) 0.124 (0.031) 


0.172 (0.021) 
0.122 (0.017) 


0.162 (0.015) 0.156 (0.015) 0.36 (0.048) 
0.129 (0.023) 0.132 (0.024) 0.159 (0.033) 


0.22 (0.019) 
0.178 (0.013) 


0.194 (0.019) 0.186 (0.02) 0.383 (0.046) 
0.183 (0.021) 0.186 (0.021) 0.191 (0.03) 


0.255 (0.015) 
0.223 (0.011) 


0.243 (0.024) 0.237 (0.024) 0.405 (0.043) 
0.235 (0.022) 0.24 (0.023) 0.213 (0.03) 



Table 4: Results for simulations based on observed dependencies within the "Cardio" and "Leukemia" 
microarrays: True false discovery proportions (FDP) and FDR estimates with standard errors are given 
when a pre-specified number of tests are rejected. Results using the sphering algorithm (in bold) are compared 
to data that has been row and array centered. All data was simulated under the matrix decomposition model, 
|T]), with parameters given in Section \5. 1\ and repeated ten times. Two sets of values should be compared: 
the true FDP with sheering to without sphering, and the FDR estimates compared to the true FDP for both 
with and without sphering. 



5.2 Simulations: Other Models 

We now evaluate the performance of our method using simulations based on models other 
than the matrix- variate normal, namely a latent variable model and a random effects model. 
In these simulations we will not only compare our sphering method to the standard method, 



but also to the surrogate variable analysis method of Leek and Storey (2008). We first 
compare this method and model's properties to our sphering algorithm and matrix decom- 
position model, and then compare these methods numerically. 



To account for possible latent variables in a multiple testing framework, |Leek and Storey 



(2008) propose a matrix model and the surrogate variable analysis (SVA) method. They 
propose the model for the data X G W nxn , X = B S +rG + U where S is a signal matrix, 
G G ffi dxn for d < n is the latent variable matrix, U G W nxn is independent noise and B 
and r G $t mxd are coefficients to be estimated. This model is similar in nature to our matrix 
decomposition model Q. If we assume X has been previously centered, using the notation 
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Standard 
FDP FDR 


Sphered 
FDP FDR 


SVA 
FDP FDR 


Latent Variable Model 














40 tests 


0.08 


0.168 


0.05 


0.0365 


0.0528 


0.0546 


(0.017) 


(0.023) 


(0.014) 


(0.0086) 


(0.01) 


(0.0045) 


45 tests 


0.109 


0.212 


0.0889 


0.0657 


0.0716 


0.0868 


(0.016) 


(0.022) 


(0.018) 


(0.013) 


(0.0097) 


(0.0092) 


50 tests 


0.15 


0.303 


0.124 


0.106 


0.127 


0.118 


(0.018) 


(0.035) 


(0.017) 


(0.021) 


(0.01) 


(0.015) 


55 tests 


0.189 


0.383 


0.167 


0.166 


0.17 


0.183 


(0.015) 


(0.051) 


(0.018) 


(0.021) 


(0.012) 


(0.02) 


60 tests 


0.24 


0.453 


0.215 


0.215 


0.217 


0.27 


(0.011) 


(0.05) 


(0.017) 


(0.023) 


(0.0099) 


(0.025) 


Random Effects Model 














40 tests 


0.375 


0.011 


0.0361 


0.0318 


0.4 


0.136 


(0.011) 


(0.002) 


(0.02) 


(0.012) 


(0.047) 


(0.042) 


45 tests 


0.433 


0.0161 


0.0642 


0.0863 


0.439 


0.169 


(0.0084) 


(0.0031) 


(0.026) 


(0.031) 


(0.042) 


(0.048) 


50 tests 


0.48 


0.0196 


0.102 


0.14 


0.475 


0.185 


(0.014) 


(0.004) 


(0.02) 


(0.033) 


(0.041) 


(0.047) 


55 tests 


0.52 


0.0229 


0.154 


0.207 


0.507 


0.213 


(0.013) 


(0.0044) 


(0.018) 


(0.037) 


(0.034) 


(0.053) 


60 tests 


0.553 


0.0288 


0.202 


0.309 


0.538 


0.235 


(0.012) 


(0.0054) 


(0.014) 


(0.048) 


(0.031) 


(0.053) 



Table 5: Simulations based on a latent variable and random- effects models as described in Section 5.2 The 



true FDP (FDP) is compared to the FDR estimates (FDR,) for a fixed number of rejected tests using the 
step-up method of Benjamini and Hochberg ,1995) for three pre-processing techniques: Standard (row and 



column centered), our sphering algorithm, and the surrogate variable analysis (SVA) method. Averages are 
taken on ten repetitions and standard errors are given. 



of the latent variable model, we can write as X = BS + S5 U As. Thus, our model 
accounts for structure within the data through the row and column covariances X and A, 
whereas their method estimates the structure through G and assumes the noise is additive. 
Assuming that the latent variable model or our model is correct, applying the respective 
algorithms results in approximately independent p- values. Also similar to our method, SVA 
can change the rankings of the test statistics. Unlike SVA, however, our model and sphering 
algorithm directly capture and account for possible correlations among the columns as well 
as the rows. 

We simulate data from a latent variable model taken directly from Leek and Storey] 
(2008) as well as from a random effects model denoting a batch effect. For both models, 
the data is of dimension 250 x 50, with 25 columns in each class with the following signal: 
The first 50 rows are non-null given by ^i,i : 25 = 0.5, ^1,26:50 = —0.5, ^2,1:25 = —0.5, 
^2,26:50 = 0.5 and the last 200 elements of ipi and ip2 equal to zero. For the latent variable 
model, there are two latent variables given by Gij l ~Bern(0.5), coefficients *~ iV(0, 1) 
and noise fjy ~ iV(0, 1). For the random-effects model with K batches indicated by indices 
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I(k), a column of the data is given by the following: X r j = v + \Xj + X^=i V'i-fy/eCi) + 
^2k=i /3fc^(je-T(fc))+ e J where u, fj,j and ipi are fixed effects, and *~ N(fik,^lX) independent of 

N(0, Si) are random effects. In our simulation, we have = [—0.5 —0.25 0.25 0.5], 
<t 2 = 0.5, X = Si as defined in Section 3.2 and I(k) indicating batches of five columns. 

In Table [5} we compare the true FDP to the estimate of the FDR via the step- up method 
(Benjamini and Hochberg, 1995) for the data with standard pre-processing, with sphering 
and with Leek and Storey (2008)'s surrogate variable analysis (SVA). The SVA method was 
implemented using the defaults available in the package sva from CRAN, the R language 
repository. For the latent variable simulation, both our sphering method and the SVA 
method improve the rank ordering of the test statistics resulting in higher statistical power 
as well as improved estimates of the FDR. In the random-effects model simulation, however, 
the sphering algorithm substantially outperforms the standard pre-processing and the SVA 
method. We illustrate this by looking at the specific case where 50 tests are rejected. For 
the standard pre-processing and SVA methods, the true FDP is 0.48 and 0.475 respectively, 
meaning that on average 25 out of the 50 rejected tests are false positives. With sphering, 
however, the order of the test statistics is dramatically changed leading to a true FDP of 
0.102 so that on average only 5 out of 50 rejected tests are false. The FDR estimates using 
the step-up method are also problematic for the standard and SVA methods as 0.0196 and 
0.185 are substantially below the true FDPs of 0.48 and 0.475 respectively. If these methods 
were used, the number of false positives would not be controlled. With sphering, however, 
the FDR estimate of 0.14 is much closer to the true FDP, 0.102 and is a conservative 
estimate, as desired. 

These simulations based on models other than the matrix-variate normal reveal the 
robustness of our pre-processing technique. Our sphering method also compares very favor- 
ably to the surrogate variable analysis, another pre-processing method. 



6 Discussion 

In this paper, we have demonstrated that using standard statistical methodology to conduct 
inference on transposable data is problematic. As a method of solving these problems, we 
have prosed a sphering pre-processing technique that de-correlates the data yielding approx- 
imately independent rows and columns. We have revealed the advantages and robustness 
of this method through simulations on many correlated data sets. 

The major disadvantage of our method is its computational cost. Fitting the transpos- 
able regularized covariance model with L± penalties is approximately 0(k(m 3 +p s )), where k 
is the total number of iterations needed until convergence. Thus, directly fitting this model 
to microarrays, for example, where m may be twenty or thirty thousand, is not currently 
feasible. A simple fix can be proposed, that is to first filter the genes by the absolute value 
of their un-sphered T-statistics down to say 1,000 or 500 genes. Since the signal in each 
gene remains the same before and after sphering, filtering should not effect the power to 
detect non-null genes, especially since researchers are rarely interested in re-testing over 500 
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genes. As future work, we will examine approximations to the TRCM covariance estimates 
that can be used in high-dimensional settings and would circumvent the need to filter the 
genes before sphering. 

There are many components of our work that deserve further investigation and testing. 
First, Allen and Tibshirani (2010) outline some of the properties of the TRCM covariances 
estimates, but several questions, such as the consistency of the estimates, remain. Also, 
direct estimation of 77, the scaled variance of the Z-statistic that depends on the array 
covariance, should be examined to find a consistent estimate of 77. 

In conclusion, our model and study have revealed several important issues related to 
large-scale inference with transposable data such as microarrays. First, correlation among 
the columns proves to be a major problem, both theoretically and in simulations, when 
comparing test statistics to a theoretical or permutation null distribution. This results is 
striking as inference is often conducted under the false assumption of column independence. 
Second, despite the lack of theoretical results supporting the use of many common FDR- 
controlling procedures for test statistics with arbitrary dependence structure, the procedures 
seem to conservatively estimate the FDP under a variety of correlation scenarios. Finally, 
our method of de-correlating the data is a way to directly model the covariance structure 
in a multiple testing framework. This method leads to 1) improvements in the statistical 
power, and to 2) better estimation of the FDR. While this paper has focused on the example 
of two-class microarrays, our model and methods may prove useful in a variety large-scale 
inference problems with highly transposable data sets. 
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A Additional Simulation Results 



B Proofs 

Proof 1 (Theorem []} Let z be a vector of N(0, 1) random variables. Then, if we arrange x as a 
column vector, we have 
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True FDP 


r uk Histimates 
(Benjamini & (Storey & , „ nnnv) 
riochberg, lyyo) libsniram, 2UUo) 


Si,A 2 

40 tests 

45 tests 
50 tests 
55 tests 
60 tests 


U.0125 (0.00/ ij 
0.0194 (0.013) 


U.Uool (U.UloJ U.U/oo (U.U19J U.UMy (U.U24) 
0.022 (0.0074) 0.0215 (0.0072) 0.0495 (0.029) 


r\ noon 1 r\ m r\ 

0.0333 (0.015) 
0.0321 (0.018) 


0.164 (0.031) 0.16 (0.032) 0.117 (0.032) 
0.0436 (0.011) 0.0431 (0.011) 0.0761 (0.03) 


0.078 (0.015) 

n nccc fn no\ 
U.Uooo \\j.\JJi) 


0.281 (0.027) 0.276 (0.028) 0.165 (0.032) 

r\ f\QQO f f\ m o\ n noTc /n m o\ nii'7fnnoi\ 
U.Uooo ^U.UloJ U.Uo75 (U.UloJ U.117 ^U.UolJ 


0.135 (0.012) 

n i oq fn m c\ 
U.Uo (U.U15J 


0.368 (0.028) 0.364 (0.028) 0.194 (0.028) 

n 1 qi /n no\ n 1 oo /n no) ni Tn / n noi \ 
U.lol yv.ijZ) U.lo^ (U.Uzj U.17y ^U.UolJ 


0.198 (0.011) 

n ~t c\a f n m a\ 
U.ly4 ^U.U14J 


0.461 (0.044) 0.458 (0.045) 0.234 (0.028) 

r\ Oyif) f r\ noo\ n o a a fn noo) n om fn noo\ 
yD.xJJ.o) U.244 (U.UzoJ U.^Ul ^U.Uooj 


£ 2 ,A = I 

40 tests 

45 tests 
50 tests 
55 tests 
60 tests 


U.Uoo (U.U17J 
0.03 (0.015) 


U.U49o (U.UUol) U.Uolo (U.UUoz) U.lol (U.Uo4J 
0.0279 (0.0058) 0.0279 (0.0055) 0.081 (0.029) 


0.0711 (0.019) 
0.0644 (0.017) 


n noon /n ai n\ i~i no rr' / r\ mo \ r\ or\r\ / n n ■! 1 \ 

0.0839 (0.012) 0.0856 (0.012) 0.209 (0.041) 
0.0607 (0.0096) 0.0604 (0.0094) 0.133 (0.035) 


0.12 (0.018) 

n nno /a ni o\ 
u.uyo (U.UloJ 


0.141 (0.021) 0.143 (0.021) 0.249 (0.045) 

r\ noon / o m q \ n nno e: ( i\ mo) n i ci f n no k\ 

u.uyoy ^u.uioj u.uyoo (u.uioj u.ibi ^u.uooj 


0.167 (0.016) 

n i a fn nnn\ 

U.14 (u.uuyj 


0.191 (0.029) 0.192 (0.029) 0.278 (0.04) 

n i kq f n ni g\ n i a f n ni c) n ono fn noK\ 
U.loo ^U.UlbJ U.lb (U.UlbJ U.^sUo ^U.UooJ 


0.217 (0.014) 

n i nT /a nncc\ 
U.lyY (U.UUboj 


0.243 (0.035) 0.246 (0.035) 0.311 (0.041) 

n oot fn ni v\ n oon fn ni t\ n O/io (n no a\ 
\\S.\3\.l) \S.Z2A3 (U.U17J ^U.Uo4J 


Sa,Ai 

40 tests 

45 tests 
50 tests 
55 tests 
60 tests 


U.UUo (U.UUo) 
(0) 


U.U455 (U.UUoo) U.U4«3y (U.UUoo) U.UUo4o (U.UU41) 
0.00305 (0.0014) 0.00267 (0.0012) 0.00185 (0.001) 


0.0111 (0.005) 
(0) 


0.111 (0.027) 0.11 (0.026) 0.0198 (0.0076) 
0.00845 (0.0031) 0.00783 (0.0029) 0.00656 (0.0026) 


0.042 (0.0081) 

n no fn nnci \ 
U.Uo (U.UUbi ) 


0.225 (0.046) 0.225 (0.047) 0.0493 (0.014) 

n nAoa fn nn*7£i\ n nAoo (n nn^c) n noo fn nnoo\ 
U.U4ob (^U.UU7bj U.U4oo (U.UU75J U.Uoo ^U.UUooJ 


0.109 (0.0086) 

0.09b4 (0.0039) 


0.404 (0.034) 0.409 (0.034) 0.0923 (0.017) 
U.118 (0.U14) 0.118 (U.015) 0.075b (0.014) 


0.178 (0.0056) 

n i / ri nni t\ 
U.lbo ^U.UU17 ) 


0.552 (0.048) 0.554 (0.048) 0.133 (0.018) 

n n a ( n ni a\ n 01 ti fn m a\ n 1 1 a fn ni k\ 
U.zl4 (^U.U14j U.21b (U.U14J U.114 (^U.Uloj 


S 2 ,A 2 

40 tests 

45 tests 
50 tests 
55 tests 
60 tests 


0.0075 (0.0053) 
0.0125 (0.0077) 


0.0712 (0.016) 0.066 (0.015) 0.0444 (0.012) 
0.0108 (0.0021) 0.0104 (0.0019) 0.0152 (0.0042) 


n noi i i c\ ni i \ 
U.0311 (0.011) 

0.0244 (0.0084) 


0.1d9 (0.U22) U.163 (0.021) U.Uer (0.U19) 
0.0317 (0.0056) 0.0309 (0.0057) 0.0336 (0.007) 


0.078 (0.012) 
0.072 (0.015) 


0.277 (0.025) 0.271 (0.025) 0.132 (0.021) 
0.092 (0.016) 0.0918 (0.016) 0.0738 (0.011) 


0.144 (0.013) 
0.136 (0.013) 


0.388 (0.037) 0.383 (0.037) 0.167 (0.019) 
0.162 (0.013) 0.162 (0.013) 0.116 (0.013) 


0.198 (0.013) 
0.202 (0.012) 


0.47 (0.044) 0.468 (0.043) 0.197 (0.02) 
0.223 (0.014) 0.223 (0.014) 0.143 (0.01) 



Table 6: Additional simulation study results: True false discovery proportions (FDP) and FDR estimates 
with standard errors are given when a pre-specified number of tests are rejected. Results using the sphering 
algorithm (in bold) are compared to data that has been row and column centered. All data was simulated 
under the matrix decomposition model, ([T]), with parameters given in Section 3.2 and repeated ten times. 
Two sets of values should be compared: the true FDP with sphering to without sphering, and the FDR 
estimates compared to the true FDP for both with and without sphering. 
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Thus, we can write Z as a sum of the scaled independent and normally distributed random variables z. The 
expected value of Z is trivial and the variance can be written as the following. 



Var(Z) = — — Var (xi - x 2 ) 
o A c n 



i r n ( i i 
- Var EkrE^-^E 1 



The last step follows since the Zj 's are independent. Note also that if we let Wi = ^ ni 1 
write n = Y,7=i £?=i WiWj. 



i e Ci 
iec 2 : 



Proof 2 (Corollary [T]) 77us is triwai following the proof of Theorem [7] since the matrix square root 

' Li 



of A can be written as L 



L 2 



Proof 3 (Proposition [TJ) For part (i), the matrix decomposition model implies that E(Xij) = Ui + 
flj + ipk.i for k — 1,2 depending on the class of array j. Each element of the noise as defined in Step 
2 can be written as Nij = Xij — fij — Vi — ipk,i where ftj — — y"^_ n Xg , i>i — ^ Ej-i P^»j — Aj); ara( ^ 
■i/jfc^ = — 5^ 3 - gC (Xij — fij — vC). We show that E(-/Vy) = 0, which in the process proves that E(Xij) = ij)k,i- 



1 " 1 

c(Aj + Pi + $w) = - E E ( X ^) + zr E E ( Xi ^) 



i i 

n rik 



EE B <*« 

i=l " jeC fc " j£C fc i=l 

+ + V^fc + fik + Ui — ip k ^ — p, k — V — <0fe 
Mj + f< + H>k,i- 



Now for part (n), X-S = N = E 2 N A 2 , and following the proof of part (i), N ~ iV m , n (0, 0, E, A). 
27ie characteristic function of the centered matrix-variate normal is </>x(Z) = etr(— | Z T E Z A) where etr 
is t/ie exponential of the trace function (Gupta and Nagar 1999). The characteristic function o/N can then 
be written as 



,(Z) = etr 



E E 2 Z A 



etr 



etr 



--|S - Z A 



-A 2 Z T E 2 EE 2 Z A 2 A 



2 



EE 



Z A 2 A A 



Thus, letting E = E 2 E E 2 and A 



A A 



we have N ~ N m ,n(0, 0, E, A). 



Proof 4 (Proposition [2]) Since we are considering the test statistic for one gene, we will suppress 
the index i. We can define the random variables Z = [X\ — X%) / 'a^fc^ and D = s^. ^Jo 2 . Then, T 
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can be written as f = Z/V~D. From Theorem^ Z ~ N((ipi — ^>2)/c\/cm, i]/c n ). Then, under the null, 
Z/Vv/ C n ~ -W(0, 1). j4Zso, under the null, Da 2 /a 2 ~ X(n-2) ™^ ^ an <^ independent. Then, 
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